Protein-protein interactions can be properly modeled as scale-free complex networks, and the lethality of proteins has been correlated with node degree, thereby defining a lethality-centrality rule. In this work we revisit this problem by focusing attention not on proteins as a whole, but on their functional domains, which are ultimately responsible for their binding potential. Four networks are considered: the original protein-protein interaction network, its randomized version, and two domain networks assuming different lethality hypotheses. Using formal statistical analysis, we show that the correlation between connectivity and essentiality is higher for domains than for proteins.

PACS numbers: 89.75.Fb, 02.10.Ox, 89.75.Da, 87.80.Tq 



A great deal of the functionality of proteins stems from their ability to dock, i.e. to connect. Such dockings are highly specific and depend on geometrical and field compatibilities between the involved proteins. More specifically, the docking sites of a protein are largely defined by the presence of specific domains, i.e. portions of amino-acid sequences along the protein primary backbone. Given that protein-protein interactions involve physical interactions between protein domains, domain-domain interaction information can be particularly useful for validating, annotating, and even predicting protein interactions. The subject of protein domain interaction has been covered in previous investigations.

Protein-protein interaction networks are obtained by representing each protein as a node and each possible docking between pairs of proteins as an edge linking the respective nodes. Domain-domain interaction networks can be constructed from protein complexes, from Rosetta Stone sequences, or from protein interaction networks. The current work considers the last approach, taking into account the domain subnetworks contained in protein-protein interaction networks. This method allows not only the direct visualization of the coexistence of domains and proteins, naturally accounting for the multiplicity of domains, but also the objective quantification of interactions between domains.

Given a network, the degree of a specific node is defined as the number of connections between that node and the remainder of the network. This frequently used measurement can be generalized to express the connectivity not of a single node, but of a whole subnetwork contained in the original network. Subnetworks can be obtained by selecting a subset of nodes from the original network together with the edges between those nodes. The degree of a subnetwork is then defined as the number of connections between its nodes and the remaining nodes of the network, disregarding the connections internal to the subnetwork. By quantifying the number of interactions between the subnetwork and the overall structure, the subnetwork degree provides a valuable indication of the role and importance of each subnetwork.
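The subnetwork degree defined above can be illustrated with a minimal Python sketch. The adjacency-set representation and the function names `build_graph` and `subnetwork_degree` are illustrative choices, not taken from the original work; a single node is simply the special case of a one-node subnetwork.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Undirected graph stored as adjacency sets (an illustrative choice).
Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def subnetwork_degree(graph: Graph, members: Iterable[Hashable]) -> int:
    """Number of edges joining a subnetwork node to a node outside it.

    Edges with both endpoints inside the subnetwork are internal and ignored.
    Each external edge has exactly one endpoint inside, so it is counted once.
    """
    inside = set(members)
    return sum(
        1
        for node in inside
        for neighbour in graph.get(node, set())
        if neighbour not in inside
    )


# Hypothetical toy network.
g = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")])
print(subnetwork_degree(g, {"B"}))       # ordinary degree of B: 3
print(subnetwork_degree(g, {"B", "C"}))  # B-C is internal; B-A, B-D, C-D count: 3
```

Note that merging B and C into one subnetwork leaves the degree at 3 here: the edge B-C disappears as internal, while C-D becomes an external edge of the pair.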



One particularly interesting way to define a subnetwork is by selecting, among the nodes in the original network, those that exhibit some specific feature. In a protein-protein interaction network, a subnetwork can be obtained by selecting the nodes that contain one or more instances of a specific protein domain. Note that such a subnetwork is embedded within the original protein-protein interaction network. A whole collection of subnetworks can then be obtained, one for each considered domain, and valuable insights about the importance and role of the domains can be inferred by using the concepts of subnetwork degree and subnetwork hubs. We applied these concepts to the Saccharomyces cerevisiae protein-protein interaction network, using the non-redundant database of interacting proteins by Sprinzak et al. (formed by 4,135 proteins and 8,695 connections). The respective domains were identified using the Pfam database, which contains a large collection of multiple sequence alignments and profile hidden Markov models (HMMs) covering the majority of protein domains, yielding a total of 1,424 domains. Figure 1 shows part of such a network, where four domain subnetworks are identified in black circles, white squares, black squares and black diamonds.
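The construction of one subnetwork per domain can be sketched as follows. This is a minimal illustration assuming the Pfam annotations have already been reduced to a mapping from each protein to the set of domains it contains; the names `domain_subnetworks`, `domain_degrees` and the toy labels are hypothetical. The optional `normalize` flag divides each degree by the number of proteins in the domain's subnetwork, following the normalization described in the footnote.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def domain_subnetworks(protein_domains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each domain to the proteins (nodes) containing at least one copy of it."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            members[domain].add(protein)
    return dict(members)


def domain_degrees(
    edges: Iterable[Tuple[str, str]],
    protein_domains: Dict[str, Set[str]],
    normalize: bool = False,
) -> Dict[str, float]:
    """Subnetwork degree of every domain embedded in a protein-protein network.

    With normalize=True the degree is divided by the number of proteins
    in the domain's subnetwork.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    degrees: Dict[str, float] = {}
    for domain, nodes in domain_subnetworks(protein_domains).items():
        # Count only edges leaving the subnetwork; internal edges are ignored.
        external = sum(1 for n in nodes for m in graph.get(n, ()) if m not in nodes)
        degrees[domain] = external / len(nodes) if normalize else external
    return degrees


# Hypothetical toy data (not from the Sprinzak or Pfam datasets).
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
annotations = {"P1": {"D1"}, "P2": {"D1", "D2"}, "P3": {"D2"}, "P4": {"D3"}}
print(domain_degrees(edges, annotations))  # {'D1': 1, 'D2': 2, 'D3': 1}
```

Protein P2 belongs to two subnetworks at once, which mirrors the overlapping black/white squares of Figure 1.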

In order to investigate the relationship between domain connectivity and lethality, it is necessary to extend the concept of essentiality to domains. However, as there is no consensus about domain lethality in the scientific literature, we propose the following two hypotheses:

• I. Domain lethality in a weak sense: a domain is 
lethal if it appears in a lethal protein. 

• II. Domain lethality in a strong sense: a domain is 
lethal if it appears in a single-domain lethal protein. 
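The two hypotheses can be expressed as a short classification routine. This is a sketch under the assumption that "single-domain" means a protein annotated with exactly one distinct domain type; the function name `lethal_domains` and the toy labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def lethal_domains(
    protein_domains: Dict[str, Set[str]], lethal_proteins: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return the (weak, strong) sets of lethal domains.

    weak   -- hypothesis I: the domain appears in at least one lethal protein.
    strong -- hypothesis II: the domain is the only (distinct) domain of a
              lethal protein.
    """
    weak: Set[str] = set()
    strong: Set[str] = set()
    for protein, domains in protein_domains.items():
        if protein not in lethal_proteins:
            continue
        weak |= domains
        if len(domains) == 1:
            strong |= domains
    return weak, strong


# Hypothetical annotations; P2 and P3 are the lethal proteins.
annotations = {"P1": {"A"}, "P2": {"A", "B"}, "P3": {"C"}, "P4": {"D"}}
weak, strong = lethal_domains(annotations, {"P2", "P3"})
print(sorted(weak), sorted(strong))  # ['A', 'B', 'C'] ['C']
```

Domain A is lethal in the weak sense even though it also occurs in the viable protein P1, which is exactly why hypothesis I is called weak; every strong-sense lethal domain is also weak-sense lethal.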

FIG. 1: Partial representation of the S. cerevisiae protein-protein interaction network, with four domain subnetworks identified in black circles, white squares, black squares and black diamonds. The degree of the latter can be obtained by counting the number of edges between the black diamond nodes and the other nodes of the structure, yielding a total degree equal to 3. Note that the overlap between the white-square and black-square subnetworks, characterized by the five black/white squares, indicates the presence of two different domains in the proteins associated with those nodes.

TABLE I: Statistical values for protein and domain networks.

Proteins                   4,135   795   2.10   38    –     0.12   0.10
Domains in weak sense      1,424   499   2.24   22   2.5    0.61   0.61
Domains in strong sense      818   243   1.27   15   2.6    0.73   0.77

The first definition is considered weak because a lethal domain can appear in lethal and viable proteins simultaneously. However, this assumption is still potentially interesting because co-occurring domains are more likely to exhibit similar function or localization than domains in separate proteins, which suggests that lethal proteins may involve uniformly lethal domains. The second
hypothesis, on the other hand, is considered strong because, if a domain is the only one in a lethal protein, it must be responsible for the protein's essential function. When working with the first assumption, the whole protein interaction network is studied; for the second assumption, only the subnetwork formed by proteins with a single domain is considered. It is important to note that the two lethality situations above are just hypotheses to be checked against the experimental results reported by Jeong et al. concerning protein-protein interaction networks. In other words, the identification of a high correlation between degree and lethality under one of these hypotheses could be understood as supporting that assumption, in light of the centrality-lethality rule, which is widely believed to reflect the special importance of hubs in organizing the network and the biological significance of network architectures, a key notion in systems biology. Figure 2 shows the cumulative distributions of the protein degree and of the domain subnetwork degrees in both the weak and strong senses. The cumulative degree distribution for all networks follows a power law with an exponential cut-off (a finite-size effect), described by $P(k) \propto (k + k_0)^{-\gamma}\, e^{-(k + k_0)/k_c}$. The values of $k_c$ and $\gamma$ obtained from the cumulative distributions for proteins and domains are presented in Table I.

FIG. 2: The cumulative degree distributions of proteins and domain subnetworks in the considered S. cerevisiae network follow a power law with an exponential cut-off, $P(k) \propto (k + k_0)^{-\gamma}\, e^{-(k + k_0)/k_c}$, represented by the continuous line. The values of $k_c$ and $\gamma$ for proteins and domains are presented in Table I.
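The text does not state how the cut-off power law was fitted. One simple option, sketched below as an assumption rather than the authors' procedure, uses the fact that for a fixed $k_0$ the logarithm $\ln P = c - \gamma \ln(k + k_0) - (k + k_0)/k_c$ is linear in $(c, \gamma, 1/k_c)$: solve that linear least-squares problem, then scan $k_0$ over a grid and keep the best fit. All function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cumulative_distribution(degrees: Sequence[int]) -> List[Tuple[int, float]]:
    """(k, P(K >= k)) for every distinct observed degree k."""
    n = len(degrees)
    return [(k, sum(1 for d in degrees if d >= k) / n) for k in sorted(set(degrees))]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_fixed_k0(points: Sequence[Tuple[int, float]], k0: float) -> Tuple[float, float, float]:
    """Least-squares fit of ln P = c - gamma*ln(k+k0) - (k+k0)/kc for a fixed k0.

    Returns (gamma, kc, sum of squared residuals in log space).
    """
    rows = [(1.0, math.log(k + k0), k + k0) for k, _ in points]
    ys = [math.log(p) for _, p in points]
    # Normal equations (A^T A) x = A^T y for the three linear coefficients.
    ata = [[sum(f[i] * f[j] for f in rows) for j in range(3)] for i in range(3)]
    aty = [sum(f[i] * y for f, y in zip(rows, ys)) for i in range(3)]
    c, b_log, b_lin = _solve3(ata, aty)
    sse = sum((y - (c + b_log * f[1] + b_lin * f[2])) ** 2 for f, y in zip(rows, ys))
    kc = math.inf if b_lin == 0 else -1.0 / b_lin
    return -b_log, kc, sse


def fit_cutoff_power_law(points, k0_grid):
    """Scan k0 over a grid; return the (k0, gamma, kc, sse) with the smallest error."""
    return min(((k0, *fit_fixed_k0(points, k0)) for k0 in k0_grid), key=lambda t: t[3])
```

Usage: `fit_cutoff_power_law(cumulative_distribution(degrees), [0.5 * i for i in range(1, 9)])`. Fitting in log space weights the sparse high-degree tail as heavily as the well-populated low degrees; a nonlinear fit on the raw probabilities would be an equally reasonable alternative.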

Figure 3 shows the relationship between degree and essentiality for the protein and domain networks. The lethality of proteins was determined using the MIPS database, and the number of lethal proteins, $N_L$, for the considered networks is shown in Table I. The abscissa represents the node degree $k$ of proteins or domains, limited by the cut-off (see Figure 2) whose values are presented in Table I, while the ordinate expresses the fraction of lethal proteins or domains among those with degree $k$. In order to determine the correlation between the fraction of lethal proteins/domains and their degree, we estimated the Pearson correlation coefficient $r$, which measures the strength of the linear relationship between two variables, and the Spearman rank correlation coefficient $\rho$, a nonparametric coefficient used in the case of nonlinear (monotonic) relationships. Table I presents the values of the correlations for proteins and domains, which indicate that the correlation between lethality and degree is larger under hypotheses (I) and (II) than for whole proteins. The statistical significance of the differences between correlations was tested by applying Fisher's test for comparing correlation coefficients, yielding the p-value $p$. The comparison between the correlation coefficients of proteins and domains in the weak sense results in $p < 0.035$ for $r$ and $p < 0.001$ for $\rho$. The comparison of proteins with domains in the strong sense yields $p < 0.015$ for $r$ and $p < 0.001$ for $\rho$. These results lead to the conclusion that the domain correlations in both the weak and strong senses are significantly higher than the correlation obtained for proteins.
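The statistics used in this paragraph can be sketched in a few lines of Python: the lethal fraction per degree up to the cut-off, Pearson's $r$, Spearman's $\rho$ (Pearson's $r$ on average ranks, so ties are handled), and Fisher's z-test for two independent correlation coefficients. The function names are hypothetical, and which sample sizes enter the Fisher test is not stated in the text; using the number of degree values up to the cut-off is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def lethal_fraction_by_degree(
    degree: Dict[str, int], lethal: Set[str], k_max: int
) -> Tuple[List[int], List[float]]:
    """Degrees k <= k_max present in the data and the lethal fraction at each k."""
    total: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for node, k in degree.items():
        if k <= k_max:
            total[k] += 1
            hits[k] += int(node in lethal)
    ks = sorted(total)
    return ks, [hits[k] / total[k] for k in ks]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_compare(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Fisher z-test for two independent correlations; returns (z, two-sided p)."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return z, math.erfc(abs(z) / math.sqrt(2))
```

A typical pipeline computes `ks, fracs = lethal_fraction_by_degree(...)` for each network, evaluates `pearson(ks, fracs)` and `spearman(ks, fracs)`, and passes pairs of coefficients to `fisher_compare`.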

Since protein domains represent the basic evolutionary units that form proteins, it is not surprising that domains should play a fundamental role in defining protein interactions and lethality. Accordingly, the obtained results indicate that the interactions between proteins may be defined at the domain level, with the importance of domains being associated with their functions. As hubs tend to be the most important nodes in networks, domains with a larger number of connections should be particularly fundamental (essential) for network maintenance. Indeed, domains with a high number of connections act as interconnecting pathways in the network (also called backbones) which, when removed, cause a substantial increase in the network diameter (as discussed by Jeong et al.). The special importance of domain essentiality can be readily inferred by inspecting the results presented in Figure 3 and Table I. Furthermore, lethal domains are more likely to be hubs than lethal proteins. In other words, both hypotheses about domain lethality are supported by the experimental results, with the strong-sense hypothesis supported more clearly than its weak-sense counterpart.

FIG. 3: Fraction of lethal proteins and domains with a particular degree for (a) proteins, (b) domains in the weak sense, (c) domains in the strong sense and (d) the network using protein permutations (see the text for details).

In order to verify whether the distribution of domains among proteins influences the domain connectivity, we randomized the protein positions in the network while maintaining the network structure, which was done by permuting the proteins assigned to the nodes. For 100 randomized versions of the network (see Figure 3(d)), the correlations obtained are close to zero (see Table I), confirming that the relation between connectivity and lethality is unlikely to be a spurious effect.
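The permutation test can be sketched as below. This is a simplified, protein-level illustration: node degrees (the network structure) stay fixed, and each permutation reassigns proteins, together with their lethality flags, to different nodes. For the domain networks one would additionally rebuild the domain subnetworks and their degrees after every permutation; that step is omitted here. The names `degree_lethality_r` and `permutation_null`, and the fixed seed, are hypothetical choices.

```python
import math
import random
from collections import defaultdict
from typing import List, Sequence


def degree_lethality_r(degrees: Sequence[int], lethal: Sequence[bool], k_max: int) -> float:
    """Pearson r between degree k and the lethal fraction among nodes of degree k.

    Returns NaN when the correlation is undefined (fewer than two degree
    values, or zero variance).
    """
    total, hits = defaultdict(int), defaultdict(int)
    for k, is_lethal in zip(degrees, lethal):
        if k <= k_max:
            total[k] += 1
            hits[k] += int(is_lethal)
    xs = sorted(total)
    ys = [hits[k] / total[k] for k in xs]
    if len(xs) < 2:
        return math.nan
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else math.nan


def permutation_null(
    degrees: Sequence[int], lethal: Sequence[bool], k_max: int, runs: int = 100, seed: int = 0
) -> List[float]:
    """Correlations obtained after randomly reassigning proteins to nodes."""
    rng = random.Random(seed)
    flags = list(lethal)  # copy, so the caller's data are not shuffled
    results = []
    for _ in range(runs):
        rng.shuffle(flags)
        results.append(degree_lethality_r(degrees, flags, k_max))
    return results
```

Averaging the finite values returned by `permutation_null` gives the null-model correlation; values near zero indicate that the observed degree-lethality correlation does not arise from the network structure alone.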

The results presented here suggest a novel fundamental relationship between protein and domain interactions, which has several implications for future work, such as validating, annotating, and even predicting protein interactions and lethality. Our results can also serve as a preliminary investigation guiding the acquisition of experimental data on domain interaction and lethality.
The domain degree is normalized by the number of proteins present in the subgraph of the respective domain, so as to avoid the artificially high degrees otherwise induced by more abundant domains, which would bias the results.



